Skip to content

Conversation

@jonathan-buttner
Copy link
Contributor

This PR switches two IllegalStateException to be ElasticsearchStatusException that explicitly return a 400. This is for the text expansion query builder and the text embedding query vector builder classes.

IllegalStateException gets translated into a 500 error response which does not represent what actually occurred. This would arise when a user attempts to perform a text expansion query or text embedding vector query when the index mapping is not configured to support that embedding type.

@jonathan-buttner jonathan-buttner added >bug :ml Machine learning Team:ML Meta label for the ML team v9.2.0 labels Oct 1, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @jonathan-buttner, I've created a changelog YAML for you.


// This is a hack so that we can control when the client returns valid and invalid results
// to test both the success and failure paths
private final AtomicBoolean shouldProduceDenseVectorResults = new AtomicBoolean();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test class inherits a lot of functionality. It's unclear to me how to create a test that relies on the results being invalid without affecting all the other tests that will run and depend on simulateMethod. If you have ideas I'm happy to make changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A yaml test would be more appropriate here imo. We need an end to end test

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling this @jonathan-buttner
I left some comments

+ modelId
+ "] a compatible model?"
+ "] a compatible model?",
RestStatus.BAD_REQUEST
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use an IllegalArgumentException similar to what we do in SparseVectorQueryBuilder?

+ modelId
+ "] a text embedding model?"
+ "] a text embedding model?",
RestStatus.BAD_REQUEST
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here?


// This is a hack so that we can control when the client returns valid and invalid results
// to test both the success and failure paths
private final AtomicBoolean shouldProduceDenseVectorResults = new AtomicBoolean();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A yaml test would be more appropriate here imo. We need an end to end test

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

allowed_warnings:
- "text_expansion is deprecated. Use sparse_vector instead."
- match: { error.root_cause.0.type: "illegal_argument_exception" }
- match: { error.root_cause.0.reason: "expected a result of type [text_expansion_result] received [text_embedding_result]. Is [dense-inference-id] a compatible model?" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: why do we need to use a semantic text field to get this error? That feels unrelated since it's not even used in the query.
We can also move it to resources/rest-api-spec/test/ml/search_knn_query_vector_builder.yml ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah good point, I'll move it and remove the dependency

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I can't seem to get it to work with a regular dense vector field:

XPackRestIT > test {p0=ml/search_knn_query_vector_builder/Text expansion query against semantic_text field using a dense vector model returns an failure} FAILED
    java.lang.AssertionError: Failure at [ml/search_knn_query_vector_builder:112]: expected [400] status code but api [search] returned [403 Forbidden] [{"error":{"root_cause":[{"type":"status_exception","reason":"Trained model [text_embedding_model] is configured for task [text_embedding] but called with task [text_expansion]"

I'm going to leave it in the semantic text tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks like another bug?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean we should return a 400 in that situation too instead of a 403? Yeah I typically would think the 403 would be for permissions issues.

@jonathan-buttner jonathan-buttner marked this pull request as ready for review October 2, 2025 13:29
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/ml-core (Team:ML)

@jonathan-buttner jonathan-buttner enabled auto-merge (squash) October 2, 2025 13:31
@jonathan-buttner jonathan-buttner merged commit 5da01ac into elastic:main Oct 2, 2025
34 checks passed
@jonathan-buttner jonathan-buttner deleted the ml-fix-text-expansion-query branch October 2, 2025 14:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>bug :ml Machine learning Team:ML Meta label for the ML team v9.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants